On Partially Observable Markov Decision Processes Using Genetic Algorithm Based Q-Learning

Authors

  • HAITAO LIU
  • BINGRONG HONG
Abstract

As powerful probabilistic models for optimal policy search, partially observable Markov decision processes (POMDPs) still suffer from problems such as hidden state and uncertainty in action effects. In this paper, a novel approximate algorithm, genetic-algorithm-based Q-learning (GAQ-Learning), is proposed to solve POMDP problems. In the proposed methodology, a genetic algorithm maintains a population of policies, while traditional Q-learning supplies predicted rewards that indicate the fitness of the evolved policies. GAQ-Learning addresses the hidden-state problem with a novel hybrid method in which historical information is combined with the current belief state to find the optimal policy. Experiments conducted on benchmark datasets show that the proposed methodology is superior to other state-of-the-art approximate POMDP methods.
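The division of labor described above (a GA evolves policies; a learned return estimate scores them) can be sketched as follows. The five-state corridor, population sizes, and mutation rate are illustrative assumptions, not the paper's benchmarks, and the Monte-Carlo rollout below stands in for the Q-learning-derived fitness used in GAQ-Learning:

```python
import random

random.seed(0)

# Toy domain (assumed for illustration): a 5-state corridor 0..4 whose
# states 2..4 all emit observation 2, so the true state is hidden.
N_OBS, N_ACTIONS, GOAL = 3, 2, 4

def fitness(policy, horizon=15):
    """Estimated return of a memoryless policy (observation -> action).
    This rollout return plays the role of the Q-learning fitness."""
    state, total = 0, 0.0
    for _ in range(horizon):
        obs = min(state, N_OBS - 1)            # aliased observation
        move = 1 if policy[obs] == 1 else -1   # action 1 = step right
        state = max(0, min(GOAL, state + move))
        total += state / GOAL                  # shaped reward toward goal
    return total

def evolve(pop_size=20, generations=30, mutation_rate=0.2):
    """GA outer loop: selection, one-point crossover, point mutation."""
    pop = [[random.randrange(N_ACTIONS) for _ in range(N_OBS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, N_OBS)   # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:
                child[random.randrange(N_OBS)] ^= 1   # flip one action
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
```

In this toy domain only the always-right policy `[1, 1, 1]` reaches and holds the goal, so the GA converges to it quickly; the paper's contribution is replacing the rollout fitness with Q-learning's predicted rewards on much harder benchmarks.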


Related articles

Safe Q-Learning on Complete History Spaces

In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze certain conditions under which a history based approach is able to learn policies compar...
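The history-space idea can be illustrated with a short sketch. The two-state toy dynamics, horizon, and hyperparameters below are assumptions for illustration, not taken from the cited article; the point is only that the Q-table is indexed by the complete observation-action history rather than by a latent state:

```python
import random
from collections import defaultdict

random.seed(0)
ACTIONS = (0, 1)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

# Q-table keyed by (history, action); aliased observations are
# disambiguated by the path taken to reach them.
Q = defaultdict(float)

def step(state, action):
    """Assumed toy dynamics: the hidden state alternates 0 -> 1 -> 0,
    and action 1 pays off only in state 1."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    return (state + 1) % 2, reward

for _ in range(500):                       # training episodes
    state, history = 0, ()
    for _ in range(4):                     # short horizon
        h = history + (0,)                 # both states emit observation 0
        if random.random() < EPS:          # epsilon-greedy exploration
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(h, a)])
        state, reward = step(state, action)
        nh = h + (action, 0)               # successor history
        target = reward + GAMMA * max(Q[(nh, a)] for a in ACTIONS)
        Q[(h, action)] += ALPHA * (target - Q[(h, action)])
        history = h + (action,)
```

Because histories grow with the horizon, practical variants must bound or prune them; the cited article's contribution concerns conditions under which such history-based learning remains sound.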


Training a real-world POMDP-based Dialogue System

Partially Observable Markov Decision Processes provide a principled way to model uncertainty in dialogues. However, traditional algorithms for optimising policies are intractable except for cases with very few states. This paper discusses a new approach to policy optimisation based on grid-based Q-learning with a summary of belief space. We also present a technique for bootstrapping the system ...


Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes

Satinder Singh, AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, [email protected]. Partially Observable Markov Decision Processes (POMDPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-learning may no long...
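Why stochastic memoryless policies matter can be seen in a minimal aliased example (the two-room domain, action semantics, and trial counts below are illustrative assumptions, not the cited experiments): both non-goal states emit the same observation and each requires a different escape action, so every deterministic memoryless policy traps the agent in one of them, while a randomized policy escapes both:

```python
import random

random.seed(0)

def avg_steps(p_action0, trials=2000, cap=100):
    """Average steps to goal under a memoryless stochastic policy that
    picks action 0 with probability p_action0 on the single shared
    observation. Action 0 escapes state 'L'; action 1 escapes 'R'."""
    total = 0
    for _ in range(trials):
        state = random.choice("LR")        # uniform random start
        steps = 0
        while steps < cap:
            steps += 1
            action = 0 if random.random() < p_action0 else 1
            if (state, action) in {("L", 0), ("R", 1)}:
                break                      # goal reached
        total += steps
    return total / trials

deterministic = avg_steps(1.0)  # always action 0: trapped on 'R' starts
stochastic = avg_steps(0.5)     # fair coin: ~2 expected steps either way
```

The optimal memoryless policy here is the randomized 50/50 one; exploiting such randomization under perceptual aliasing is exactly what memoryless-policy methods for POMDPs are designed to do.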


Solving POMDP by On-Policy Linear Approximate Learning Algorithm

This paper presents a fast Reinforcement Learning (RL) algorithm to solve Partially Observable Markov Decision Process (POMDP) problems. The proposed algorithm is devised to provide a policy-making framework for Network Management Systems (NMS), which is in essence an engineering application without an exact model. The algorithm consists of two phases. Firstly, the model is estimated and policy...


Active gesture recognition using partially observable Markov decision processes

We present a foveated gesture recognition system that guides an active camera to foveate salient features based on a reinforcement learning paradigm. Using vision routines previously implemented for an interactive environment, we determine the spatial location of salient body parts of a user and guide an active camera to obtain images of gestures or expressions. A hidden-state reinforcement lear...



Publication year: 2007